Building Text-To-Speech Voices in the Cloud
نویسندگان
چکیده
The AT&T VOICEBUILDER provides a new tool to researchers and practitioners who want to have their voices synthesized by a high-quality commercial-grade text-to-speech system without the need to install, configure, or manage speech processing software and equipment. It is implemented as a web service on the AT&T Speech Mashup Portal. The system records and validates users’ utterances, processes them to build a synthetic voice and provides a web service API to make the voice available to real-time applications through a scalable cloud-based processing platform. All the procedures are automated to avoid human intervention. We present experimental comparisons of voices built using the system.
منابع مشابه
Automatic Building of Synthetic Voices from Audio Books
Current state-of-the-art text-to-speech systems produce intelligible speech but lack the prosody of natural utterances. Building better models of prosody involves development of prosodically rich speech databases. However, development of such speech databases requires a large amount of effort and time. An alternative is to exploit story style monologues (long speech files) in audio books. These...
متن کاملExperiments with Unit Selection Speech Databases for Indian Languages
This paper presents a brief overview of unit selection speech synthesis and discuss the issues relevant to the development of voices for Indian languages. We discuss a few perceptual experiments conducted on Hindi and Telugu voices. 1 Role of Language Technologies Most of the Information in digital world is accessible to a few who can read or understand a particular language. Language technolog...
متن کاملVoice Conservation and TTS System for People Facing Total Laryngectomy
The presented paper is focused on the building of personalized text-to-speech (TTS) synthesis for people who are losing their voices due to fatal diseases. The special conditions of this issue make the process different from preparing professional synthetic voices for commercial TTS systems and make it also more difficult. The whole process is described in this paper and the first results of th...
متن کاملData Selection and Adaptation for Naturalness in HMM-Based Speech Synthesis
We describe experiments in building HMM text-to-speech voices on professional broadcast news data from multiple speakers. We build on earlier work comparing techniques for selecting utterances from the corpus and voice adaptation to produce the most natural-sounding voices. While our ultimate goal is to develop intelligible and natural-sounding synthetic voices in low-resource languages rapidly...
متن کاملText to speech in new languages without a standardized orthography
Many spoken languages do not have a standardized writing system. Building text to speech voices for them, without accurate transcripts of speech data is difficult. Our language independent method to bootstrap synthetic voices using only speech data relies upon cross-lingual phonetic decoding of speech. In this paper, we describe novel additions to our bootstrapping method. We present results on...
متن کامل